Method and system for multimedia communication control

ABSTRACT

A multipoint control unit (MCU) or other digital video-processing apparatus operates to manipulate compressed digital video from several compressed digital video sources. The apparatus has a plurality of video input modules and a plurality of video output modules. Each of the video input modules receives a compressed video signal from one of the sources and generally decodes the data into a primary data stream and a secondary data stream. Each video output module receives the primary and secondary data streams from at least one of the input modules and generally encodes them into a compressed output stream for transmission.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 09/506,861 filed on Jan. 13, 2000, now U.S. Pat. No. 6,300,973, and claims the benefit of the filing date for the same.

BACKGROUND

In video communication, e.g., video conferencing, Multipoint Control Units (“MCU's”) serve as switches and conference builders for the network. The MCU's receive multiple audio/video streams from the various users' terminals, or codecs, and transmit to the various users' terminals audio/video streams that correspond to the desired signal at the users' stations. In some cases, where the MCU serves as a switchboard, the transmitted stream to the end terminal is a simple stream from a single other user. In other cases, it is a combined “conference” stream composed of a combination of several users' streams.

An important function of the MCU is to translate or manipulate the input streams into the desired output streams from all and to all codecs. One aspect of this “translation” is a modification of the bit rate between the original stream and the output stream. This rate matching modification can be achieved, for example, by changing the frame rate, the spatial resolution, or the quantization accuracy of the corresponding video. The output bit rate, and thus the modification factor used to achieve the output bit rate, can be different for different users, even for the same input stream. For instance, in a four party conference, one of the parties may be operating at 128 Kbps, another at 256 Kbps, and two others at T1. Each party needs to receive the transmission at the appropriate bit rate. The same principles apply to “translation,” or transcoding, between parameters that vary between codecs, e.g., different coding standards like H.261/H.263; different input resolutions; and different maximal frame rates in the input streams.

Another use of the MCU can be to construct an output stream that combines several input streams. This option, sometimes called “compositing” or “continuous presence,” allows a user at a remote terminal to observe, simultaneously, several other participants in the conference. The choice of these participants can vary among different users at different remote terminals of the conference. In this situation, the number of bits allocated to each participant can also vary, and may depend on the on-screen activity of the users, on the specific resolution given to the participant, or some other criterion.

All of this elaborate processing, e.g., transcoding and continuous presence processing, must be done under the constraint that the input streams are already compressed by a known compression method, usually based on a standard like ITU's H.261 or H.263. These standards, as well as other video compression standards like MPEG, are generally based on a Discrete Cosine Transform (“DCT”) process wherein the blocks of the image (video frame) are transformed, and the resulting transform coefficients are quantized and coded.

One prior art method first decompresses the video streams; performs the required combination, bridging and image construction; and finally re-compresses the video streams for transmission. This method requires high computation power, leads to degradation in the resulting video quality and suffers from large propagation delay. One of the most computation intensive portions of the prior art methods is the encoding portion of the operation, where such things as motion vectors and DCT coefficients have to be generated so as to take advantage of spatial and temporal redundancies. For instance, to take advantage of spatial redundancies in the video picture, the DCT function can be performed. To generate DCT coefficients, each frame of the picture is broken into blocks and the discrete cosine transform function is performed upon each block. In order to take advantage of temporal redundancies, motion vectors can be generated. To generate motion vectors, consecutive frames are compared to each other in an attempt to discern pattern movement from one frame to the next. As would be expected, these computations require a great deal of computing power.
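The block transform described above can be illustrated with a minimal sketch. It assumes 8×8 blocks and the orthonormal DCT-II scaling; the function name `dct_2d` is hypothetical, and the direct four-loop evaluation is chosen for readability, not speed.

```python
import math

N = 8  # assumed block size (8x8 blocks, as used by H.261/H.263)


def dct_2d(block):
    """Direct 2-D DCT-II of an N x N block with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# A flat block has no spatial detail, so all of its energy
# lands in the single DC coefficient at position [0][0].
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

The flat block shows the redundancy removal in its simplest form: 64 identical samples collapse to one nonzero coefficient.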

In order to reduce computation complexity and increase quality, others have searched for methods of performing such operations in a more efficient manner. Proposals have included operating in the transform domain on motion compensated, DCT compressed video signals by removing the motion compensation portion and compositing in the DCT transform domain.

Therefore, a method is needed for performing the “translation” operations of an MCU, such as modifying bit rates, frame rates, and compression algorithms, in an efficient manner that reduces propagation delays, degradation in signal quality, video bandwidth use within the MCU and computational complexity.

SUMMARY

The present invention relates to an improved method of processing multimedia/video data in an MCU or other digital video processing device (VPD). By reusing information embedded in a compressed video stream received from a video source, the VPD can improve the quality and reduce the total computations needed to process the video data before sending it to the destination. More specifically, the present invention operates to manipulate compressed digital video from several compressed digital video sources. A video input module receives compressed video input data from a video source. A generalized decoder within the video input module decodes the compressed video input data and generates a primary video data stream. The generalized decoder also processes the compressed video input data and the primary video data stream to generate a secondary data stream. A video output module, which includes a rate control unit and a generalized encoder, receives the primary video data stream and the secondary data stream from at least one input module. The generalized encoder, in communication with the rate control unit, receives the primary video data from one or more input modules and encodes the primary video data into combined compressed video output data. The use of the secondary data stream by the output module improves the speed of encoding and the quality of the compressed video data.

FIGURES

The construction designed to carry out the invention will hereinafter be described, together with other features thereof. The invention will be more readily understood from a reading of the following specification and by reference to the accompanying drawings forming a part thereof, wherein an example of the invention is shown and wherein:

FIG. 1 illustrates a system block diagram for implementation of an exemplary embodiment of the general function of this invention.

FIG. 2 illustrates a block diagram of an exemplary embodiment of a generalized decoder.

FIG. 3 illustrates a block diagram of another exemplary embodiment of a generalized decoder.

FIG. 4 illustrates a block diagram of an exemplary embodiment of a generalized encoder operating in the spatial domain.

FIG. 5 illustrates a block diagram of an exemplary embodiment of a generalized encoder operating in the DCT domain.

FIG. 6 illustrates an exemplary embodiment of a rate control unit for operation with an embodiment of the present invention.

FIG. 7 is a flow diagram depicting exemplary steps in the operation of a rate control unit.

FIG. 8 illustrates an exemplary embodiment of the present invention operating within an MCU wherein each endpoint has a single dedicated video output module and a plurality of dedicated video input modules.

FIG. 9 illustrates an exemplary embodiment of the present invention having a single video input module and a single video output module per logical unit.

DETAILED DESCRIPTION

An MCU is used where multiple users at endpoint codecs communicate in a simultaneous video conference. A user at a given endpoint may simultaneously view multiple endpoint users at his discretion. In addition, the endpoints may communicate at differing data rates using different coding standards, so the MCU facilitates transcoding of the video signals between these endpoints.

FIG. 1 illustrates a system block diagram for implementation of an exemplary embodiment of the general function of the invention. In an MCU, compressed video input 115 from a first endpoint codec is brought into a video input module 105, routed through a common interface 150, and directed to a video output module 110 for transmission as compressed video output 195 to a second endpoint codec. The common interface may include any of a variety of interfaces, such as shared memory, ATM bus, TDM bus, switching and direct connect. The invention contemplates that there will be a plurality of endpoints enabling multiple users to participate in a video conference. For each endpoint, a video input module 105 and a video output module 110 may be assigned. Common interface 150 facilitates the transfer of video information between multiple video input modules 105 and multiple video output modules 110.

Compressed video 115 is sent to error correction decoder block 117 within video input module 105. Error correction decoder block 117 takes the incoming compressed video 115 and removes the error correction code. An example of an error correction code is BCH coding. This error correction decoder block 117 is optional and may not be needed with certain codecs.

The video stream is next routed to the variable length unencoder, VLC⁻¹ 120, for decoding the variable length coding usually present within the compressed video input stream. Depending on the compression used (H.261, H.263, MPEG, etc.), it recognizes the stream header markers and the specific fields associated with the video frame structure. Although the main task of the VLC⁻¹ 120 is to decode this variable length code and prepare the data for the following steps, VLC⁻¹ 120 may take some of the information it receives, e.g., stream header markers and specific field information, and pass this information on to later function blocks in the system.

The video data of the incoming stream contains quantized DCT coefficients. After decoding the variable length code, Q⁻¹ 125 dequantizes the representation of these coefficients to restore the numerical value of the DCT coefficients in a well known manner. In addition to dequantizing the DCT coefficients, Q⁻¹ 125 may pass through some information, such as the step size, to other blocks for additional processing.

Generalized decoder 130 takes the video stream received from the VLC⁻¹ 120 through Q⁻¹ 125 and, based on the frame memory 135 content, converts it into “generalized decoded” frames (according to the domain chosen for transcoding). The generalized decoder 130 then generates two streams: a primary data stream and a secondary data stream. The primary data stream can be either frames represented in the image (spatial) domain, frames represented in the DCT domain, or some variation of these, e.g., error frames. The secondary data stream contains “control” or “side information” associated with the primary stream and may contain motion vectors, quantizer identifications, coded/uncoded decisions, filter/non-filter decisions, frame type, resolution, and other information that would be useful to the encoding of a video signal.

For example, for every macroblock, there may be an associated motion vector. Reuse of the motion vectors can reduce the amount of computations significantly. Quantizer values are established prior to the reception of encoded video 115. Reuse of quantizer values, when possible, can allow generalized encoder 170 to avoid quantization errors and send the video coefficients in the same form as they entered the generalized decoder 130. This configuration avoids quality degradation. In other cases, quantizer values may serve as first guesses during the reencoding process. Statistical information can be sent from the generalized decoder 130 over the secondary data stream. Such statistical information may include data about the amount of information within each macroblock of an image. In this way, more bits may later be allocated by rate control unit 180 to those macroblocks having more information.
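As an illustration of how per-macroblock statistics from the secondary data stream might steer bit allocation, the following sketch splits a frame budget in proportion to an activity score. The proportional rule, the function name `allocate_bits`, and the integer activity scores are assumptions made for the example; the text above states only that macroblocks with more information may receive more bits.

```python
def allocate_bits(activity, frame_budget):
    """Split frame_budget across macroblocks in proportion to activity.

    activity: list of non-negative integer scores, one per macroblock
              (hypothetical statistic carried as side information).
    Returns a list of integer bit counts; rounding down means the sum
    may fall slightly short of frame_budget.
    """
    total = sum(activity)
    if total == 0:
        # No macroblock stands out, so share the budget evenly.
        share = frame_budget // len(activity)
        return [share] * len(activity)
    return [frame_budget * a // total for a in activity]


# The busier macroblock receives three times the bits of the quiet one.
budget = allocate_bits([1, 3], 400)
```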

Because filters may be used in the encoding process, extraction of filter usage information in the generalized decoder 130 also can reduce the complexity of processing in the generalized encoder 170. While the use of filters in the encoding process is a feature of the H.261 standard, it will be appreciated that the notion of the reuse of filter information should be read broadly to include the reuse of information used by other artifact removal techniques.

In addition, the secondary data stream may contain decisions made by processing the incoming stream, such as image segmentation decisions and camera movement identification. Camera movements include such data as pan, zoom and other general camera movement information. By providing this information over the secondary data stream, the generalized encoder 170 may make a better approximation when re-encoding the picture by knowing that the image is being panned or zoomed.

This secondary data stream is routed over the secondary (Side Information) channel 132 to the rate control unit 180 for use in video output block 110. Rate control unit 180 is responsible for the efficient allocation of bits to the video stream in order to obtain maximum quality while at the same time using the information extracted from generalized decoder 130 within the video input block 105 to reduce the total computations of the video output module 110.

The scaler 140 takes the primary data stream and scales it. The purpose of scaling is to change the frame resolution in order to later incorporate it into a continuous presence frame. Such a continuous presence frame may consist of a plurality of appropriately scaled frames. The scaler 140 also applies proper filters for both decimation and picture quality preservation. The scaler 140 may be bypassed if the scaling function is not required in a particular implementation or usage.

The data formatter 145 creates a representation of the video stream. This representation may include a progressively compressed stream. In a progressively compressed stream, a progressive compression technique, such as wavelet based compression, represents the video image in an increasing resolution pyramid. Using this technique, the scaler 140 may be avoided, and the data analyzer 155 and the editor 160 may take from the common interface only the amount of information that the editor requires for the selected resolution.

The data formatter 145 facilitates communication over the common interface and assists the editor 160 in certain embodiments of the invention. The data formatter 145 may also serve to reduce the bandwidth required of the common interface by compressing the video stream. The data formatter 145 may be bypassed if its function is not required in a particular embodiment.

When the formatted video leaves data formatter 145 of the video input block, it is routed through common interface 150 to the data analyzer 155 of video output block 110. Routing may be accomplished through various means including busses, switches or memory.

The data analyzer 155 inverts the representation created by the data formatter 145 into a video frame structure. In the case of progressive coding, the data analyzer 155 may take only a portion of the generated bit-stream to create a reduced resolution video frame. In embodiments where the data formatter 145 is not present or is bypassed, the data analyzer 155 is not utilized.

After the video stream leaves the data analyzer 155, the editor 160 can generate the composite video image. It receives a plurality of video frames; it may scale the video frame (applying a suitable filter for decimation and quality), and/or combine various video inputs into one video frame by placing them inside the frame according to a predefined or user defined screen layout scheme. The editor 160 may receive external editor inputs 162 containing layout preferences or text required to be added to the video frame, such as speech translation, menus, or endpoint names. The editor 160 is not required and may be bypassed or not present in certain embodiments not requiring the compositing function.
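The scale-and-place operation of the editor can be sketched as follows. This is an illustrative simplification: frames are plain lists of pixel rows, the decimation filter is assumed to be a 2×2 box average, and the function names `downscale_by_two` and `compose` and the layout format (a top-left corner per frame) are hypothetical.

```python
def downscale_by_two(frame):
    """Halve the resolution, averaging each 2x2 neighbourhood (box filter)."""
    height = len(frame) // 2 * 2
    width = len(frame[0]) // 2 * 2
    return [
        [(frame[r][c] + frame[r][c + 1]
          + frame[r + 1][c] + frame[r + 1][c + 1]) // 4
         for c in range(0, width, 2)]
        for r in range(0, height, 2)
    ]


def compose(frames, layout, out_height, out_width, fill=0):
    """Place each frame on a blank canvas at its (top, left) layout position."""
    canvas = [[fill] * out_width for _ in range(out_height)]
    for frame, (top, left) in zip(frames, layout):
        for r, row in enumerate(frame):
            for c, pixel in enumerate(row):
                canvas[top + r][left + c] = pixel
    return canvas


# Two participants shown side by side in one continuous presence frame.
a = [[1, 1], [1, 1]]
b = [[2, 2], [2, 2]]
picture = compose([a, b], [(0, 0), (0, 2)], out_height=2, out_width=4)
```

A user defined layout would simply supply a different list of positions, and frames would pass through `downscale_by_two` first when the layout calls for smaller tiles.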

The rate control unit 180 controls the bit rate of the outgoing video stream. The rate control operation is not limited to a single stream and can be used to control multiple streams in an embodiment comprising a plurality of video input modules 105. The rate control and bit allocation decisions are made based on the activities and desired quality for the output stream. A simple feedback mechanism that monitors the total number of bits to all streams can assist in these decisions. In effect, the rate control unit becomes a statistical multiplexer of these streams. In this fashion, certain portions of the video stream may be allocated more bits or more processing effort.

Feedback from generalized encoder 170, feedback from VLC 190, side information from the secondary channel 132, and external input 182 may all be used to allow a user to select certain aspects of signal quality. For instance, a user may choose to allocate more bits of a video stream to a particular portion of an image in order to enhance clarity of that portion. The external input 182 is a bi-directional port to facilitate communications from and to an external device.

In addition to using the side information from the secondary channel 132 to assist in its rate control function, rate control unit 180 may, optionally, merely pass side information directly to the generalized encoder 170. The rate control unit 180 also assists the quantizer 175 with quantizing the DCT coefficients by identifying the quantizer to be used.

Generalized encoder 170 basically performs the inverse operation of the generalized decoder 130. The generalized encoder 170 receives two streams: a primary stream, originally generated by one or more generalized decoders, edited and combined by the editor 160; and a secondary stream of relevant side information coming from the respective generalized decoders. Since the secondary streams generated by the generalized decoders are passed to the rate control function 180, the generalized encoder 170 may receive the side information through the rate control function 180 either in its original form or after being processed. The output of the generalized encoder 170 is a stream of DCT coefficients and additional parameters ready to be transformed into a compressed stream after quantization and VLC.

The output DCT coefficients from the generalized encoder 170 are quantized by Q₂ 175, according to a decision made by the rate control unit 180. These coefficients are fed back to the inverse quantizer block Q₂⁻¹ 185 to generate, as a reference, a replica of what the decoder at the endpoint codec would obtain. This reference is typically the sum of this feedback and the content of the frame memory 165. This process is intended to avoid error propagation. Now, depending on the domain used for encoding, the difference between the output of the editor 160 and the motion compensated reference (calculated either in the DCT or spatial domain) is encoded into DCT coefficients, which are the output of the generalized encoder 170.
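The feedback loop can be made concrete with a small numeric sketch. It assumes a uniform quantizer with a single step size and treats a block as a flat list of values in one domain; the function names are hypothetical, and motion compensation is omitted so that only the quantize, dequantize, and sum path is shown.

```python
def quantize(values, step):
    """Uniform quantizer: map each value to the nearest multiple of step."""
    return [int(round(v / step)) for v in values]


def dequantize(levels, step):
    """Inverse quantizer: restore approximate values from quantizer levels."""
    return [level * step for level in levels]


def encode_block(source, reference, step):
    """Encode source against reference and rebuild what the far end will see.

    Returns (levels, reconstructed). The reconstructed block, not the
    original source, becomes the reference for the next frame, so the
    encoder and the remote decoder stay in step.
    """
    residual = [s - r for s, r in zip(source, reference)]
    levels = quantize(residual, step)
    feedback = dequantize(levels, step)
    reconstructed = [r + f for r, f in zip(reference, feedback)]
    return levels, reconstructed


levels, new_reference = encode_block([101, 53, 7], [90, 50, 8], step=4)
```

The reconstructed values differ slightly from the source because of quantization loss. Storing them in the frame memory, instead of the source, is what keeps that loss from accumulating from frame to frame.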

The VLC 190, or variable length coder, removes the remaining redundancies from the quantized DCT coefficients stream by using lossless coding tables defined by the chosen standard (H.261, H.263, etc.). VLC 190 also inserts the appropriate motion vectors, the necessary headers and synchronization fields according to the chosen standard. The VLC 190 also sends to the rate control unit 180 the data on the actual number of bits used after variable length coding.

The error correction encoder 192 next receives the video stream and inserts the error correction code. In some cases this may be BCH coding. This error correction encoder 192 block is optional and, depending on the codec, may be bypassed. Finally, it sends the stream to the end user codec for viewing.

In order to more fully describe aspects of the invention, further detail on the generalized decoder 130 and the generalized encoder 170 follows.

FIG. 2 illustrates a block diagram of an exemplary embodiment of a generalized decoder 130. Dequantized video is routed from the dequantizer 125 to the selector 210 within the generalized decoder 130. The selector 210 splits the dequantized video stream, sending the stream to one or more data processors 220 and a spatial decoder 230. The data processors 220 calculate side information, such as statistical information like pan and zoom, as well as quantizer values and the like, from the video stream. The data processors 220 then pass this information to the side information channel 132. A spatial decoder 230, in conjunction with frame memory 135 (shown in FIG. 1), fully or partially decodes the compressed video stream. The DCT decoder 240, optionally, performs the inverse discrete cosine transform. The motion compensator 250, optionally, in conjunction with frame memory 135 (shown in FIG. 1), uses the motion vectors as pointers to a reference block in the reference frame to be summed with the incoming residual information block. The fully or partially decoded video stream is then sent along the primary channel to the scaler 140, shown in FIG. 1, for further processing. Side information is transferred from spatial decoder 230 via side channel 132 for possible reuse at rate control unit 180 and generalized encoder 170.

FIG. 3 illustrates a block diagram of another exemplary embodiment of a generalized decoder 130. Dequantized video is routed from dequantizer 125 to the selector 210 within generalized decoder 130. The selector 210 splits the dequantized video stream, sending the stream to one or more data processors 320 and DCT decoder 330. The data processors 320 calculate side information, such as statistical information like pan and zoom, as well as quantizer values and the like, from the video stream. The data processors 320 then pass this information through the side information channel 132. The DCT decoder 330, in conjunction with the frame memory 135 shown in FIG. 1, fully or partially decodes the compressed video stream using a DCT domain motion compensator 340, which performs, in the DCT domain, the calculations needed to sum the reference block pointed to by the motion vectors in the DCT domain reference frame with the residual DCT domain input block. The fully or partially decoded video stream is sent along the primary channel to the scaler 140, shown in FIG. 1, for further processing. Side information is transferred from the DCT decoder 330 via the side channel 132 for possible reuse at the rate control unit 180 and the generalized encoder 170.

FIG. 4 illustrates a block diagram of an exemplary embodiment of a generalized encoder 170 operating in the spatial domain. The generalized encoder's first task is to determine the motion associated with each macroblock (MB) of the image received over the primary data channel from the editor 160. This is performed by the enhanced motion estimator 450. The enhanced motion estimator 450 receives motion predictors that originate in the side information, processed by the rate control function 180 and sent through the encoder manager 410 to the enhanced motion estimator 450. The enhanced motion estimator 450 compares, if needed, the received image with the reference image that exists in the frame memory 165 and finds the best motion prediction in the environment in a manner well known to those skilled in the art. The motion vectors, as well as a quality factor associated with them, are then passed to the encoder manager 410. The coefficients are passed on to the MB processor 460.
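A predictor-seeded search of the kind described above might look like the following sketch. It assumes the sum of absolute differences (SAD) as both the matching cost and the quality factor, and a small square search window around the predictor. These choices, and the names `sad` and `refine_vector`, are illustrative assumptions; the text above does not specify the matching criterion.

```python
def sad(current, reference, top, left, dy, dx, size):
    """Sum of absolute differences between a block and its displaced match."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(current[top + r][left + c]
                         - reference[top + dy + r][left + dx + c])
    return total


def refine_vector(current, reference, top, left, predictor, size=2, radius=1):
    """Search a small window around a reused motion vector.

    predictor: (dy, dx) taken from the side information.
    Returns ((dy, dx), cost) for the best candidate inside the frame.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(predictor[0] - radius, predictor[0] + radius + 1):
        for dx in range(predictor[1] - radius, predictor[1] + radius + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue  # candidate block would fall outside the frame
            cost = sad(current, reference, top, left, dy, dx, size)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    if best is None:
        raise ValueError("no candidate vector lies inside the reference frame")
    return best
```

Because the search covers only `(2 * radius + 1) ** 2` candidates around the predictor, its cost is a small fraction of a full-frame search, which is the saving that reusing decoder-side motion vectors is meant to provide.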

The MB processor 460 is a general purpose processing unit for the macroblock level, wherein one of its many functions is to calculate the difference MB. This is done according to an input coming from the encoder manager 410, in the form of indications whether to code the MB or not, whether to use a de-blocking filter or not, and other video parameters. In general, the responsibility of the MB processor 460 is to calculate the macroblock in the form that is appropriate for transformation and quantization. The output of the MB processor 460 is passed to the DCT coder 420 for generation of the DCT coefficients prior to quantization.

All these blocks are controlled by the encoder manager 410. It decides whether to code or not to code a macroblock; it may decide to use some deblocking filters; it gets quality results from the enhanced motion estimator 450; it serves to control the DCT coder 420; and it serves as an interface to the rate control block 180. The decisions and control made by the encoder manager 410 are subject to the input coming from the rate control block 180.

The generalized encoder 170 also contains a feedback loop. The purpose of the feedback loop is to avoid error propagation by reentering the frame as seen by the remote decoder and referencing it when encoding the new frame. The output of the encoder which was sent to the quantization block is decoded back by using an inverse quantization block, and then fed back to the generalized encoder 170 into the inverse DCT 430 and motion compensation blocks 440, generating a reference image in the frame memory 165.

FIG. 5 illustrates a block diagram of a second exemplary embodiment of a generalized encoder 170 operating in the DCT domain. The generalized encoder's first task is to determine the motion associated with each macroblock of the image received over the primary data channel from the editor 160. This is performed by the DCT domain enhanced motion estimator 540. The DCT domain enhanced motion estimator 540 receives motion predictors that originate in the side information channel, processed by rate control function 180 and sent through the encoder manager 510 to the DCT domain enhanced motion estimator 540. It compares, if needed, the received image with the DCT domain reference image that exists in the frame memory 165 and finds the best motion prediction in the environment. The motion vectors, as well as a quality factor associated with them, are then passed to the encoder manager 510. The coefficients are passed on to the DCT domain MB processor 520.

The DCT domain macroblock, or MB, processor 520 is a general purpose processing unit for the macroblock level, wherein one of its many functions is to calculate the difference MB in the DCT domain. This is done according to an input coming from the encoder manager 510, in the form of indications whether to code the MB or not, whether to use a de-blocking filter or not, and other video parameters. In general, the responsibility of the DCT domain MB processor 520 is to calculate the macroblock in the form that is appropriate for transformation and quantization.

All these blocks are controlled by the encoder manager 510. The encoder manager 510 decides whether to code or not to code a macroblock; it may decide to use some deblocking filters; it gets quality results from the DCT domain enhanced motion estimator 540; and it serves as an interface to the rate control block 180. The decisions and control made by the encoder manager 510 are subject to the input coming from the rate control block 180.

The generalized encoder 170 also contains a feedback loop. The output of the encoder which was sent to the quantization block is decoded back by using an inverse quantization block, and then fed back to the DCT domain motion compensation blocks 530, generating a DCT domain reference image in the frame memory 165.

While the generalized encoder 170 has been described with reference to a DCT domain configuration and a spatial domain configuration, it will be appreciated by those skilled in the art that a single hardware configuration may operate in either the DCT domain or the spatial domain. This invention is not limited to either the DCT domain or the spatial domain but may operate in either domain or in the continuum between the two domains.

FIG. 6 illustrates an exemplary embodiment of a rate control unit for operation with an embodiment of the present invention. Exemplary rate control unit 180 controls the bit rate of the outgoing video stream. As was stated previously, the rate control operation can apply joint transcoding of multiple streams. Bit allocation decisions are made based on the activities and desired quality for the various streams, assisted by a feedback mechanism that monitors the total number of bits to all streams. Certain portions of the video stream may be allocated more bits or more processing time.

The rate control unit 180 comprises a communication module 610, a side information module 620, and a quality control module 630. The communication module 610 interfaces with functions outside of the rate control unit 180. The communication module 610 reads side information from the secondary channel 132, serves as a two-way interface with the external input 182, sends the quantizer level to a quantizer 175, reads the actual number of bits needed to encode the information from the VLC 190, and sends instructions and data to, and receives processed data from, the generalized encoder 170.

The side information module 620 receives the side information from all appropriate generalized decoders from the communication module 610 and arranges the information for use in the generalized encoder. Parameters generated in the side information module 620 are sent via communication module 610 for further processing in the generalized encoder 170.

The quality control module 630 controls the operative side of the rate control block 180. The quality control module 630 stores the desired and measured quality parameters. Based on these parameters, the quality control module 630 may instruct the side information module 620 or the generalized encoder 170 to begin certain tasks in order to refine the video parameters.

Further understanding of the operation of the rate control unit 180 will be facilitated by referencing the flowchart shown in FIG. 7. While the rate control unit 180 can perform numerous functions, the illustration of FIG. 7 depicts exemplary steps in the operation of a rate control unit such as rate control unit 180. The context of this description is the reuse of motion vectors; in practice those skilled in the art will appreciate that other information can be exploited in a similar manner. In the method depicted in FIG. 7, at step 705, the communications module 610 within the rate control unit 180 reads external instructions for the user desired picture quality and frame rate. At step 710, communications module 610 reads the motion vectors of the incoming frames from all of the generalized decoders that are sending picture data to the generalized encoder. For example, if the generalized encoder is transmitting a continuous presence image from six incoming images, motion vectors from the six incoming images are read by the communications module 610. Once the motion vectors are read by the communications module 610, they are transferred to the side information module 620.

At step 715, the quality control module 630 instructs the side information module 620 to calculate new motion vectors using the motion vectors that were retrieved from the generalized decoders and stored, at step 710, in the side information module 620. The new motion vectors may have to be generated for a variety of reasons including reduction of frame hopping and down scaling. In addition to use in generating new motion vectors, the motion vectors in the side information module are used to perform error estimation calculations, with the result being used for further estimations or enhanced bit allocation. In addition, the motion vectors give an indication of a degree of movement within a particular region of the picture or region of interest, so that the rate control unit 180 can allocate more bits to blocks in that particular region.
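One way new motion vectors could be derived for the down scaling case is sketched below. It assumes a 2:1 reduction in which each group of 2×2 macroblocks collapses into one, and it forms the new vector by averaging the group and dividing by the scale factor. The averaging rule and the name `downscale_vectors` are assumptions for illustration; the text above does not prescribe how the new vectors are calculated.

```python
def downscale_vectors(vectors, factor=2):
    """Derive motion vectors for a frame reduced by `factor` in each axis.

    vectors: 2-D grid of (dy, dx) tuples, one per macroblock of the
             incoming frame (as read from the generalized decoders).
    Returns a smaller grid with one averaged, rescaled vector per group
    of factor x factor input macroblocks. Incomplete edge groups are dropped.
    """
    rows, cols = len(vectors), len(vectors[0])
    result = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            group = [vectors[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            mean_dy = sum(v[0] for v in group) / len(group)
            mean_dx = sum(v[1] for v in group) / len(group)
            # Displacements shrink along with the picture.
            out_row.append((mean_dy / factor, mean_dx / factor))
        result.append(out_row)
    return result


grid = [[(4, 0), (4, 0)], [(4, 0), (4, 0)]]
scaled = downscale_vectors(grid)
```

The derived vectors are candidates only; in the flow of FIG. 7 they may still be refined by the generalized encoder at step 725.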

At step 720, the quality control module 630 may instruct the side information module 620 to send the new motion vectors to the generalized encoder via the communications module 610. The generalized encoder may then refine the motion vectors further. Alternatively, due to constraints in processing power or a decision by the quality control module 630 that refinement is unnecessary, motion vectors may not be sent. At step 725, the generalized encoder will search for improved motion vectors based on the new motion vectors. At step 730, the generalized encoder will return these improved motion vectors to the quality control module 630 and will return information about the frame and/or block quality.
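The search of step 725 is not detailed in the description. One plausible reading is a small local search around each supplied vector instead of a full-range motion search. The Python sketch below assumes a sum-of-absolute-differences (SAD) cost, a 2×2 block, and a search radius of one pixel; all of these choices, and the names `sad` and `refine_vector`, are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luminance samples, indexed as frame[row][column]


def sad(cur: Frame, ref: Frame, x: int, y: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + j + dy][x + i + dx])
        for j in range(size)
        for i in range(size)
    )


def refine_vector(
    cur: Frame,
    ref: Frame,
    x: int,
    y: int,
    seed: Tuple[int, int],
    size: int = 2,
    radius: int = 1,
) -> Tuple[Optional[Tuple[int, int]], Optional[int]]:
    """Search only within `radius` pixels of the seed vector and keep the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[int, int]] = None
    best_cost: Optional[int] = None
    for dy in range(seed[1] - radius, seed[1] + radius + 1):
        for dx in range(seed[0] - radius, seed[0] + radius + 1):
            # Skip candidates whose reference block would fall outside the frame.
            if x + dx < 0 or y + dy < 0 or x + dx + size > width or y + dy + size > height:
                continue
            cost = sad(cur, ref, x, y, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

With a radius of one, only nine candidates are evaluated per block, which is the kind of saving that makes seeding the encoder with decoder-side vectors attractive.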

At step 735, the quality control module 630 determines the quantization level parameters and the temporal reference and updates the external devices and user with this quantization and temporal information. At step 740, the quality control module 630 sends the quantization parameters to the quantizer 175. At step 745, the rate control unit 180 receives the bit information from the VLC 190, which informs the rate control unit 180 of the number of bits used to encode each frame or block. At step 750, in response to this information, the quality control module 630 updates its objective parameters for further control and processing and returns to step 710.
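The feedback path of steps 735 through 750 can be pictured with a very small control rule. The Python sketch below is an assumption, not the method of the description: it raises the quantizer step when the VLC reports more bits than a target and lowers it when fewer bits were used. The 1 to 31 range matches the quantizer range of H.261 and H.263; the tolerance band and the function name are hypothetical.

```python
def update_quantizer(
    q: int,
    bits_used: int,
    target_bits: int,
    q_min: int = 1,
    q_max: int = 31,
    tolerance: float = 0.1,
) -> int:
    """Return the quantizer value to use for the next frame or block.

    A coarser quantizer (larger q) produces fewer bits, so q is increased
    when the last unit overshot the target and decreased when it undershot.
    """
    if bits_used > target_bits * (1 + tolerance):
        q += 1
    elif bits_used < target_bits * (1 - tolerance):
        q -= 1
    # Keep the result inside the legal quantizer range.
    return max(q_min, min(q_max, q))
```

A practical rate controller would also weigh the side information and the quality feedback from the encoder; this sketch isolates only the bit-count feedback from the VLC.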

The invention described above may be implemented in a variety of hardware configurations. Two such configurations are the “fat port” configuration generally illustrated in FIG. 8 and the “slim port” configuration generally illustrated in FIG. 9. These two embodiments are for illustrative purposes only, and those skilled in the art will appreciate the variety of possible hardware configurations implementing this invention.

FIG. 8 illustrates an exemplary embodiment of the present invention operating within an MCU, wherein each endpoint has a single dedicated video output module 110 and a plurality of dedicated video input modules 105. In this so-called “fat port” embodiment, a single logical unit applies all of its functionality for a single endpoint. Incoming video streams are directed from the Back Plane Bus 800 to a plurality of video input modules 105. Video inputs from the Back Plane Bus 800 are assigned to a respective video input module 105. This exemplary embodiment is more costly than the options that follow because every endpoint in an n-person conference requires n−1 video input modules 105 and one video output module 110. Thus, a total of n·(n−1) video input modules and n video output modules are needed. While costly, the advantage is that end users may allocate the layout of their conference to their liking. In addition to this “private layout” feature, having all of the video input modules and the video output module on the same logical unit permits a dedicated data pipe 850 that resides within the logical unit to facilitate increased throughput. The fact that this data pipe 850 is internal to a logical unit eases the physical limitation found when multiple units share the pipe. The dedicated data pipe 850 can contain paths for both the primary data channel and the side information channel.
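The module count stated above can be checked with a few lines of Python. The function name is hypothetical; the arithmetic follows directly from n−1 input modules and one output module per endpoint.

```python
from typing import Tuple


def fat_port_modules(n: int) -> Tuple[int, int]:
    """Return (video input modules, video output modules) for an n-person conference."""
    if n < 2:
        raise ValueError("a conference needs at least two endpoints")
    # Each of the n endpoints decodes the other n - 1 streams and encodes one output.
    return n * (n - 1), n
```

For a four-party conference this gives 12 video input modules and 4 video output modules, which shows how quickly the input side grows with the number of participants.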

FIG. 9 illustrates an exemplary embodiment of the present invention with a single video input module and a single video output module per logical unit. In an MCU in this “Slim Port” configuration, a video input module 105 receives a single video input stream from Back Plane Bus 800. After processing, the video input stream is sent to common interface 950 where it may be picked up by another video output module for processing. Video output module 110 receives multiple video input streams from the common interface 950 for compilation in the editor and output to the Back Plane Bus 800 where it will be routed to an end user codec. In this embodiment of the invention, the video output module 110 and video input module 105 are on the same logical unit and may be dedicated to serving the input/output video needs of a single end user codec, or the video input module 105 and the video output module 110 may be logically assigned as needed. In this manner, resources may be better utilized; for example, for a video stream of an end user that is never viewed by other end users, there is no need to use a video input module resource.
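The sharing described for the slim port configuration can be modeled loosely as a publish-and-grab store. In the Python sketch below, the class `CommonInterface`, its methods, and the `layouts` mapping are all hypothetical stand-ins; the description leaves the actual form of common interface 950 open.

```python
from typing import Any, Dict, List, Set


class CommonInterface:
    """Toy stand-in for a common interface holding the latest decoded frame per source."""

    def __init__(self) -> None:
        self._streams: Dict[str, Any] = {}

    def publish(self, source_id: str, frame: Any) -> None:
        """Called by a video input module after it has decoded a frame."""
        self._streams[source_id] = frame

    def grab(self, source_ids: List[str]) -> List[Any]:
        """Called by a video output module to collect the streams its layout needs."""
        return [self._streams[s] for s in source_ids if s in self._streams]


def input_modules_needed(layouts: Dict[str, List[str]]) -> Set[str]:
    """Return the sources that at least one endpoint views.

    `layouts` maps each endpoint to the sources shown in its output picture.
    A source that nobody views needs no video input module.
    """
    return {source for viewed in layouts.values() for source in viewed}
```

Because each decoded stream is published once and may be grabbed by any number of output modules, the number of input modules grows with the number of viewed sources and not with the number of viewer and source pairs.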

Because of the reduction in digital processing caused by the present architecture, including this reuse of video parameters, the video input modules 105 and the video output modules 110 can use microprocessors like digital signal processors (“DSP's”), which can be significantly more versatile and less expensive than the hardware required for prior art MCU's. Prior art MCU's that perform full, traditional decoding and encoding of video signals typically require specialized video processing chips. These specialized video processing chips are expensive, “black box” chips that are not amenable to rapid development. Their specialized nature means that they have a limited market that does not facilitate the same type of growth in speed and power as has been seen in the microprocessor and DSP field. By reducing the computational complexity of the MCU, this invention facilitates the use of fast, rapidly evolving DSP's to implement the MCU features.

From the foregoing description, it will be appreciated that the present invention describes a method of and apparatus for performing operations on a compressed video stream. The present invention has been described in relation to particular embodiments which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those skilled in the art to which the present invention pertains without departing from its spirit and scope. Accordingly, the scope of the present invention is described by the appended claims and supported by the foregoing description.

What is claimed is:
1. An apparatus for manipulating compressed digital video information to form manipulated compressed video information, the manipulated compressed video information being a manipulation of data from at least one of a plurality of compressed digital video sources, the apparatus comprising: at least one video input module for receiving compressed video input data from at least one source of the plurality of compressed digital video sources, the at least one video input module comprising a generalized decoder operative to decode the compressed video input data, generate a primary video data stream, and process the compressed video input data and the primary video data stream to generate a secondary data stream; and at least one video output module for receiving the primary video data stream and the secondary data stream from the at least one video input module, and being operative to encode the primary video data stream with references to the secondary data stream to form manipulated compressed video output data, whereby the use of the secondary data stream by the at least one video output module improves a speed of encoding and the manipulated compressed video output data's quality.
2. The apparatus of claim 1, wherein the video output module comprises: a rate control unit; and a generalized encoder, in communication with the rate control unit and operative to receive the primary video data stream, having primary video data, from the at least one video input module and encode the primary video data into the manipulated compressed video output data.
3. The apparatus of claim 2, wherein the rate control unit comprises: means to read the secondary data stream; means to process the secondary data stream; and means to control the generalized encoder based upon results of processing the secondary data stream.
4. The apparatus of claim 2, wherein the rate control unit comprises: means to read feedback data from the generalized encoder; means to process the secondary data stream with the feedback data; and means to control the generalized encoder based upon results of processing the secondary data stream with the feedback data.
5. The apparatus of claim 4, wherein the secondary data stream comprises side information which further comprises at least one type of information selected from a group consisting of: frame type, resolution, motion vectors, filter usage indication, quantizer identifications, coded/uncoded decisions, the amount of information within each macroblock, image segmentation indication, scene cut off indication, camera zoom identification, camera pan identification, camera movements identification, and statistical information.
6. The apparatus of claim 2, wherein the secondary data stream is associated with a primary data stream to form an associated secondary data stream, and the at least one video output module receives the primary video data stream, the associated secondary data stream and control information from an external device.
7. The apparatus of claim 6, wherein the rate control unit of the video output module comprises: means to read the secondary data stream; means to read the control information; means to process the secondary data stream; means to process the control information; and means to control the generalized encoder based upon results of processing the secondary data stream and results of processing the control information.
8. The apparatus of claim 7, wherein the rate control unit of the at least one video output module comprises: means to read feedback data from the generalized encoder; means to process the secondary data stream with the control information and the feedback data; and means to control the generalized encoder based upon results of processing the secondary data stream with the control information and the feedback data.
9. The apparatus of claim 6, wherein the control information includes at least one type of information selected from a group consisting of: region of interest indications, screen layout requirements, user quality preferences, and special effects.
10. The apparatus of claim 6, wherein the control information is bi-directional information.
11. The apparatus of claim 2, wherein the at least one video output module receives the primary video data stream and the secondary data stream, and the rate control unit of the at least one video output module comprises: means to read the secondary data stream; means to read data related to how many bits are used after variable length coding; means to process the secondary data stream with variable length coding information; and means to control the generalized encoder based on results of processing the variable length coding information, whereby the use of the variable length coding information and the secondary data stream by the generalized encoder improves a speed of encoding and the compressed video output signal's quality by improving an output bit allocation.
12. The apparatus of claim 11 wherein the at least one video output module receives the primary video data stream and the secondary data stream, and the rate control unit of the at least one video output module comprises: means to read feedback data from the generalized encoder; means to process the secondary data stream with the variable length coding information and the feedback data; and means to control the generalized encoder based on results of processing the secondary data stream with the variable length coding information and the feedback data.
13. The apparatus of claim 1, further comprising: means to route the primary video data from the at least one video input module to the at least one video output module; and means to route the secondary data stream from the at least one video input module to the at least one video output module.
14. The apparatus of claim 13, wherein the means to route the primary video data stream includes a common interface selected from a group consisting of: shared memory, an ATM bus, a TDM bus, switching, and a direct connection.
15. The apparatus of claim 13, wherein the means to route the secondary data stream includes a common interface selected from a group consisting of: shared memory, an ATM bus, a TDM bus, switching, and a direct connection.
16. The apparatus of claim 1, wherein the manipulation of the compressed video input data includes at least one type of manipulation selected from a group consisting of: transcoding and compositing.
17. The apparatus of claim 1, wherein the secondary data stream is associated with the primary video data stream in that the secondary data stream includes side information.
18. The apparatus of claim 1, wherein the compressed video input data includes at least one type of information selected from a group consisting of: frame type, resolution, motion vectors, filter indication, DCT coefficients and quantizer values.
19. The apparatus of claim 1, wherein the primary video data stream includes information in a DCT domain.
20. The apparatus of claim 1, wherein the primary video data stream includes information in a spatial domain.
21. An apparatus for manipulating compressed digital video forming manipulated compressed digital video, the manipulated compressed digital video being a manipulation of data from at least one of a plurality of compressed digital video sources and destinations, the apparatus comprising: at least one video input module, each video input module of the at least one video input module being operative to receive a compressed video input signal that belongs to one of the compressed digital video sources depending on the required manipulation, to decode the compressed video input signal for generating a decoded video data stream and to transfer the decoded video data stream to a common interface; at least one video output module, each video output module of the at least one video output module being operative to grab the decoded video data stream from the common interface, to encode the decoded video data stream into a compressed video output stream, and to transfer the compressed video output stream to at least one destination of the plurality of destinations; and a common interface forming a temporary logical connection for routing the decoded video data stream from at least one input module to at least one output module; wherein there is no permanent logical relation or connection between the at least one video input module and the at least one video output module, and the apparatus has a configuration in which the temporary logical connection depends on the current needs of a current manipulation, whereby use of the configuration improves resources allocation of the apparatus.
22. The apparatus of claim 21, wherein the manipulation of the compressed video input data includes at least one type of manipulations selected from a group consisting of: transcoding and compositing.
23. A compressed video combiner unit for generating a compressed digital video signal, which is a composition of plurality of compressed digital video sources, the compressed video combiner unit comprising: at least one video input module for receiving compressed video input data from at least one source of the plurality of compressed digital video sources, the at least one video input module further comprising a generalized decoder operative to decode the compressed video input data and generate a primary video data stream, the generalized decoder further comprising a data processing unit operative to process the compressed video input data and the primary video data stream to generate a secondary data stream, the secondary data stream having an association with the primary video stream forming associated secondary data; at least one video output module operative to receive at least one of the primary video data stream and the secondary data stream, the at least one video output module further comprising a rate control unit, and a generalized encoder, in communication with the rate control unit and operative to receive the primary video data from the at least one video input module and encode the primary video data into compressed video output data; means to route the primary video data from the at least one video input module to the at least one video output module; and means to route the secondary data stream from the at least one video input module to the at least one video output module; whereby the use of the secondary data stream by the at least one video output module improves a speed of encoding and the compressed video output data's quality.
24. The compressed video combiner unit of claim 23, wherein the association between the secondary data stream and the primary video data stream is that the secondary data stream includes side information.
25. The compressed video combiner unit of claim 24, wherein the side information includes at least one type of information selected from a group consisting of: frame type, resolution, motion vectors, filter usage indication, quantizer identifications, coded/uncoded decisions, an amount of information within each macroblock, image segmentation indication, scene cut off indication, camera zoom identification, camera pan identification, camera movements identification, and statistical information.
26. The compressed video combiner unit of claim 23, wherein the compressed video input data includes at least one type of information selected from a group consisting of: frame type, resolution, motion vectors, filter indication, DCT coefficients, and quantizer values.
27. The compressed video combiner unit of claim 23, wherein the rate control unit comprises: means to read the secondary data stream; means to process the secondary data stream; and means to control the generalized encoder based upon results of processing the secondary data stream.
28. The compressed video combiner unit of claim 23, wherein the rate control unit comprises: means to read feedback data from the generalized encoder; means to process the secondary data stream with the feedback data; and means to control the generalized encoder based upon results of processing the secondary data stream with the feedback data.
29. The compressed video combiner unit of claim 23, wherein the means to route the primary video data stream includes a common interface selected from a group consisting of: shared memory, an ATM bus, a TDM bus, switching, and a direct connection.
30. The compressed video combiner unit of claim 23, wherein the means to route the secondary data stream includes a common interface selected from a group consisting of: shared memory, an ATM bus, a TDM bus, switching, and a direct connection.
31. The compressed video combiner unit of claim 23, wherein the primary video data stream includes information in a DCT domain.
32. The compressed video combiner of claim 23, wherein the primary video data stream includes information in a spatial domain.
33. The compressed video combiner unit of claim 23, wherein the video output module receives at least one of the primary video data streams, the associated secondary data stream, and control information from an external device.
34. The compressed video combiner unit of claim 33, wherein the rate control unit of the video output module comprises: means to read the secondary data stream; means to read the control information; means to process the secondary data stream; means to process the control information; and means to control the generalized encoder based upon results of processing secondary data stream with results of processing control information.
35. The compressed video combiner unit of claim 34, wherein the rate control unit of the video output module comprises: means to read feedback data from the generalized encoder; means to process the secondary data stream with the control information and the feedback data; and means to control the generalized encoder based upon results of processing the secondary data stream with the control information and the feedback data.
36. The compressed video combiner unit of claim 33, wherein the control information includes at least one type of information selected from a group consisting of: a region of interest indication, screen layout requirements, user quality preferences, and special effects.
37. The compressed video combiner unit of claim 33, wherein the control information is bi-directional information.
38. The compressed video combiner unit of claim 23 wherein the at least one video output module receives the primary video data stream and the secondary data stream, and the rate control unit of the at least one video output module comprises: means to read the secondary data stream; means to read data related to how many bits are in use after variable length coding; means to process the secondary data stream with the variable length coding information; and means to control the generalized encoder based on results of processing the secondary data stream with the variable length coding information, whereby the use of the variable length coding information and the secondary data stream by the generalized encoder improves a speed of encoding and the compressed video output signal's quality by improving an output bit allocation.
39. The compressed video combiner unit of claim 38 wherein the at least one video output module receives the primary video data stream and the secondary data stream, and the rate control unit of the at least one video output module comprises: means to read feedback data from the generalized encoder; means to process the secondary data stream with the variable length coding information and the feedback data; and means to control the generalized encoder based on results of processing the secondary data stream with the variable length coding information and the feedback data.
40. An apparatus for manipulating compressed digital video forming manipulated compressed digital video, the manipulated compressed digital video being a manipulation of data from at least one of a plurality of compressed digital video sources and destinations, the apparatus comprising: at least one video input module, each video input module of the at least one video input module being operative to receive a compressed video input signal that belongs to one of the compressed digital video sources depending on the required manipulation, to decode the compressed video input signal for generating a decoded video data stream and to transfer the decoded video data stream to a common interface; at least one video output module, each video output module of the at least one video output module being operative to grab the decoded video data stream from the common interface, to encode the decoded video data stream into a compressed video output stream, and to transfer the compressed video output stream to at least one destination of the plurality of destinations; and a common interface forming a non-dedicated connection for routing the decoded video data stream from at least one video input module to at least one video output module; wherein there is no dedicated logical relation or connection between the at least one video input module, and the at least one video output module and the apparatus has a configuration in which the non-dedicated logical connection depends on the current needs of a current manipulation, whereby use of the configuration improves resources allocation of the apparatus.